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Before the SARS outbreak only two human coronaviruses (HCoV) were 
known: HCoV-OC43 and HCoV-229E. With the discovery of SARS-CoV in 
2003, a third family member was identified. Soon thereafter, we described 
the fourth human coronavirus (HCoV-NL63), a virus that has spread 
worldwide and is associated with croup in children. We report here the 
complete genome sequence of two HCoV-NL63 clinical isolates, designated 
Amsterdam 57 and Amsterdam 496. The genomes are 27,538 and 27,550 
nucleotides long, respectively, and share the same genome organization. We 
identified two variable regions, one within the 1a and one within the S gene, 
whereas the 1b and N genes were most conserved. Phylogenetic analysis 
revealed that HCoV-NL63 genomes have a mosaic structure with multiple 
recombination sites. Additionally, employing three different algorithms, we 
assessed the evolutionary rate for the S gene of group Ib coronaviruses to be 
~3x10 * substitutions per site per year. Using this evolutionary rate we 
determined that HCoV-NL63 diverged in the 11th century from its closest 
relative HCoV-229E. 

© 2006 Elsevier Ltd. All rights reserved. 
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Introduction 


genome encodes two polyproteins (1a and lab) that 
contain all proteins necessary for RNA replication. 


Coronaviruses, a genus of the Coronaviridae family, 
are enveloped viruses with a large plus-strand RNA 
genome. The genomic RNA is 27-32 kb in size, 
capped and polyadenylated. Coronaviruses have 
been identified in bats, mice, rats, chickens, turkeys, 
swine, dogs, cats, rabbits, horses, cattle and humans 
and cause highly prevalent diseases such as respira- 
tory, enteric, cardiovascular and neurological 
disorders.'? Originally, coronaviruses were classi- 
fied on the basis of antigenic cross-reactivity in three 
antigenic groups.” When coronavirus genome 
sequence data began to accumulate, the original 
antigenic groups were converted into genetic groups 
based on similarity of the nucleotide sequences. 

The coronaviruses possess a characteristic genome 
composition. The 5’ two-thirds of a coronavirus 


Abbreviations used: aa, amino acid residue(s); RdRp, 
RNA-dependent RNA polymerase; ACE, angiotensin 
converting enzyme. 
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The 3’ one-third of a coronavirus genome encodes 
several structural proteins such as spike (S), enve- 
lope (E), membrane (M) and nucleocapsid (N) 
proteins that, among other functions, participate in 
the budding process and are incorporated into the 
virus particle. Additional accessory protein genes 
are located in the 3’ part of the genome in a 
coronavirus species-specific position. 

HCoV-NL63, a recently discovered member of the 
Coronaviridae family,“ has spread worldwide, is 
observed most frequently in the winter season and is 
associated with acute respiratory illness and croup 
in young children, elderly and immunocompro- 
mised patients.”"'° A recent report suggested that 
HCoV-NL63 is one of the pathogens underlying 
Kawasaki disease,'’ although other studies could 
not confirm this association. '*'* 

HCoV-NL63 belongs to the group I coronaviruses 
according to phylogenetic analyses. The highest 
similarity is observed with HCoV-229E and porcine 
epidemic diarrhoea virus (PEDV), 65% and 61%, 
respectively. Phylogenetic analysis based on gene 
la sequences indicates the presence of diverse 
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HCoV-NL63. strains with distinct molecular 
markers.* The increasing number of HCoV-NL63 
sequences from several locations provides further 
evidence for this genetic diversity and confirms the 
presence of two main genetic clusters.'”'*'® How- 
ever, drawing conclusions based on phylogenetic 
analysis of a single gene sequence and sometimes 
even a partial gene sequence requires caution as the 
true phylogeny can only be demonstrated by 
analyzing complete genome sequences. Full gen- 
ome sequences of field isolates were, however, not 
available. Therefore, we sequenced the complete 
genomes of two HCoV-NL63 field isolates (Amster- 
dam 57 and Amsterdam 496) and genome frag- 
ments of 21 additional field isolates. Here, we 
present evidence for the in-field recombination of 
HCoV-NL63. Furthermore, we characterized the 
molecular variability of HCoV-NL63 isolates to 
have insight into the evolution of the virus. We 
observed high variability at certain genome regions 
and a molecular clock analysis revealed that the 
virus has been present in the human population for 
centuries. 


Results 


HCoV-NL63 isolates 


We sequenced the complete genomes of two 
HCoV-NL63 isolates (Amsterdam 57 and Amster- 
dam 496) directly from patient material. Isolate 496 
was obtained from the throat swab of an eight- 
month old boy (February 2003) and isolate 57 was 
amplified from the bronchoalveolar lavage of a 57- 
year old woman (December 2002), both suffering 
from acute respiratory illness. Both isolates dis- 
played the same basic genome structure as pre- 
viously described for HCoV-NL63. Furthermore, we 
partially sequenced additional isolates directly from 
patient material to analyze the variability of HCoV- 
NL63 (Table 1). The regions were: la gene nt 3004— 
3888 (3K) and nt 5815-6280 (6K), S gene nt 20,497— 
21,003 (21K), ORF3 gene nt 24,521-25,206 (25K), and 
N gene nt 26,136-27,166 (26K). In some cases only a 
few regions were amplified because of the low virus 
load in some patient samples. 


Genetic variability along the genome 


Pair-wise sequence alignments of isolates Amster- 
dam 1,7 NL,”” 57 and 496 demonstrate an overall 
genome similarity of 99.0% between the HCoV- 
NL63 strains. We plotted the frequency of poly- 
morphic nucleotides along the genome to visualize 
variable sites (Figure 1(a)) and identified two 
hypervariable regions, one in the 5’ part of the la 
gene encoding nsp1-nsp3 (nt 170-5000) and in the 5’ 
part of the spike gene (nt 20,300-22,000). The latter 
region encompasses the S1 region that contains a 
unique insert in HCoV-NL63 when compared to its 
closest relative HCoV-229E. 


Table 1. Clinical isolates of HCoV-NL63 


Viral load in 


patient sample Fragments 
Isolate name Sampling date (copies/ml) sequenced 
3 nk. 5.66 10° 3K 
27 n.k. 2.23 x 10° 3K; 21K 
42 n.k. n.d. 26K 
57 08.01.2004 2.00 x 108 Full genome 
63c nk. n.d. 26K 
72 31.12.2002 196305 3K; 6K; 21K; 
25K; 26K 
111 12.01.2004 n.d. 21K 
120 13.01.2004 n.d. 21K 
173 n.k. n.d. 3K 
202 31.03.2003 n.d. 21K 
212 01.02.2003 n.d. 21K; 25K 
223 08.01.2003 n.d. 21K; 25K; 26K 
242 10.02.2003 n.d. 25K; 26K 
246 13.01.2003 1.80 x 10* 6K; 21K; 25K; 
26K 
248 16.01.2003 1.98 x 10° 3K; 6K; 25K; 26K 
251 07.01.2003 8.3410" 25K 
466 04.02.2003 5.36x 10° 3K; 6K; 21K; 
25K; 26K 
496 25.02.2003 1.70 10° Full genome 
705 nk. 3.30 x 10° 3K; 21K 
744 nk. 7.20 x 10° 3K 
791 nk. 3.70 10° 21K 
857 10.12.2003 3.50 10” 3K; 6K; 21K; 
25K; 26K 
890 nk. 7.39 x 10° 3K 


n.k., not known. n.d., not determined. 


Within the variable region 1-5000 nt in the la 
gene, we identified 126 variable sites among the four 
full genome isolates (55 non-synonymous substitu- 
tions), which resulted in 51 variable amino acid (aa) 
positions. In this region, we also identified an in- 
frame deletion of 15 nt in isolate 496 and NL 
(corresponding to nt 3321-3335 of Amsterdam 1). To 
determine the prevalence of this deletion in the virus 
population, we analyzed partial sequences of the la 
gene (region 3K) of additional HCoV-NL63 isolates 
(Table 1) and found it in three more patients (003, 
890 and 248) while in eight patients no deletion was 
observed. The second variable region encompasses 
the S1 part of the S gene. Out of 175 polymorphic 
nucleotides (56 non-synonymous substitutions) 
leading to 51 aa substitutions, 119 are located in 
the first 1200 nt (46 non-synonymous substitutions) 
of the spike gene leading to 41 aa changes. 
Furthermore, we observed a 3 nt deletion within 
the S gene (corresponding to nt 20798 and 20800 of 
Amsterdam 1) in isolates 496, 57 and NL. Sequen- 
cing of 12 additional patient samples identified no 
additional variants with this 3 nt deletion. 

The analysis of synonymous/non-synonymous 
substitutions along the genome indicates that 
synonymous substitutions are generally in excess 
over non-synonymous substitutions (Figure 1(b)). 
To determine if the high variability in the 3K and 
21K regions is driven by positive selection we 
analyzed these regions with PAML software, 
which provides maximum likelihood estimates of 
the extent of positive selection. Likelihood ratio tests 
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Figure 1. Molecular variability of HCoV-NL63 along 
the genome. (a) Frequency of polymorphic sites at the 
nucleotide level among four isolates of HCoV-NL63. (b) 
Frequency of polymorphic sites on the synonymous and 
non-synonymous positions among four isolates of HCoV- 
NL63. The analysis was done with a 500 nt non- 
overlapping window. 


were used to assess whether a model, which 
included positive selection, was significantly better 
than one that did not. When positive selection was 
indicated, empirical Bayes’ methods were used to 
identify which individual sites were under positive 
selection. According to the PAML analysis the 3K 
and 21K regions showed no significant sign of 
positive selection. 

We analyzed the most conserved genome regions 
to identify suitable targets for the development of a 
PCR-based diagnostic assay that can detect all 
HCoV-NL63 isolates. The 1b polyprotein gene is 
highly conserved, with 25 variable nucleotides and 
only one aa substitution in the region nt 12401- 
15195 among the four isolates. This region encodes 
the RNA-dependent RNA polymerase (RdRp). The 
second most conserved region is the N protein and 
we confirmed the homogeneity in nine additional 
patient isolates. Of the 24 variable positions scored 
in a 1031 nt region, only five were non-silent and 
resulted in 4 aa changes. Although it was previously 
mentioned that the ORF3 gene is highly variable in 
strain NL,'’ we observed a low heterogeneity in 
Amsterdam 1, 57 and 496. Also among 11 patient 
isolates, only ten polymorphic nucleotides were 


observed (one non-synonymous substitution), 
resulting in only 1 aa change in the patient isolates. 
A 3 nt insertion and an additional 1 aa change were 
observed only in the cultured NL isolate.'” 


HCoV-NL63 recombination 


We analyzed the full genome sequence of the four 
HCoV-NL63 isolates for possible recombination 
events. As the explorative bootscan analysis was 
not suggestive due to the low number of highly 
similar sequences and a stochastic noise that could 
not be distinguished from the real signal, we 
decided to analyze only the regions showing high 
number of informative sites. We analyzed the partial 
sequences of five regions (3K, 6K, 21K, 25K, and 
26K) for 57, 496, NL and Amsterdam 1 isolates and 
nine additional patient isolates. The phylogenetic 
analysis confirms that in the 3K region the isolates 
Amsterdam 1 and 57 do cluster together in one 
subgroup while NL and 496 are located in the 
second (Figures 2 and 3). Phylogenetic analysis of 
the 6K region of several isolates reveals that the 
clustering pattern changes, with isolates NL, 496 
and Amsterdam 1 in one subgroup and isolate 57 
being an outgroup (Figures 2 and 3). In the 21K 
region the analysis shows that Amsterdam 1 is a 
single representative of one cluster, while NL, 496 
and 57 are tightly clustering in the second group 
(Figures 2 and 3). Therefore, there is clear evidence 
that HCoV-NL63 isolates are mosaics with multiple 
recombinations along the genome. 

Because in several regions the sequence was 
highly conserved and the analysis did not show 
signs of the presence of two genetic clusters it was 
very difficult to identify the exact location of the 
recombination spots. The S gene does contain 
enough informative sites, and we identified two 
spots of recombination: one between positions nt 
21,072 and 21,161 and the second between positions 
nt 21,662 and 21,884 in the Amsterdam 1 genome 
(Figure 4). 


Interspecies recombination 


We also analyzed whether recombination within 
the coronavirus family can be identified. The 
analysis was performed with the SimPlot software 
by plotting the similarity between different mem- 
bers of the Coronaviridae family as well as by 
scanning the genome with the bootscan tool. 
Along the genome the similarity between HCoV- 
NL63 and HCoV-229E is the highest, except for one 
part of the M gene. The similarity graph shows that 
the 3’ region of the M gene has a higher nucleotide 
similarity to PEDV than to HCoV-229E (Figure 5(a)). 
Additionally, the bootscan analysis suggests recom- 
bination between an ancestral HCoV-NL63 strain 
and PEDV in that region (Figure 5(b)). To rule out 
that the observed effect is the result of convergent 
evolution we analyzed the synonymous substitution 
pattern between HCoV-NL63 and HCoV-229E. It has 
been described that the synonymous substitution 
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Figure 2. Discordance in phylogenetic clustering of different isolates of HCoV-NL63 at regions 3K, 6K and 21K. 
Phylogenetic trees were constructed as described in Materials and Methods using an UPGMA algorithm. The scale bar 
unit represents a 0.002 substitution per site. The trees were rooted with the sequences of it’s closest relative: for 3K and 6K 
the HCoV-229E and for the 21K region the PEDV. Four completely sequenced isolates are marked with colored boxes, 


rate is increased at genome regions, which originated 
from another species and may be used as a marker of 

Fi 18,19 
gene transfer from another species. Indeed a rise 
in the synonymous substitution rate in the 3’ region 
of the M gene was observed (Figure 5(c)). 


Molecular clock analysis 


Based on the nucleotide sequence coding for the S 
protein (nt 20,649-22,269), a maximum-likelihood 
phylogenetic tree was constructed for HCoV-NL63 
and several HCoV-229E strains for which the date of 
isolation was known (three isolates from year 1967, 
five isolates from year 1999 and six isolates from year 
2000). Based on these sequence data, the evolution- 
ary rate of HCoV-229E was calculated by Bayesian 
coalescent approach,”° serial ML estimate?!*? and 
sUPGMA** approaches. The evolutionary rates 
estimated with these three approaches were of very 
similar magnitude and were 3.2810 * (95% con- 
fidence interval, 1.72 x 10 *to5.00x10 *), 6.1710 * 
and 2.82 x10 * (95% confidence interval, 1.3610 *— 
4.4210 *) substitutions per site per year, respec- 
tively. Assuming a constant evolutionary rate in time 
and between the branches for HCoV-NL63 and 
HCoV229E, the time to the most recent common 
ancestor (TMRCA) of HCoV-NL63 and HCoV-229E 


was dated by the Bayesian coalescent approach 
around the year 1053 (95% highest posterior density 
interval, year 966 to 1142). This estimate was highly 
consistent under different demographic models, 
including an exponential growth (TMRCA around 
year 1105 (95% confidence interval, 1017 to 1188)), 
and expansion growth (TMRCA around year 1124 
(95% confidence interval, 1038 to 1206)). A likelihood 
ratio test indicated that the molecular clock hypoth- 
esis could not be rejected (P=0.05). We also 
attempted to date back the split of two HCoV- 
NL63 lineages, but due to several recombination 
spots in the spike gene region that we sequenced (see 
also Figure 4) this analysis was not possible. The 3K 
and 6K fragments could not be used because we only 
know the substitution rate for the 20,649-22,269 
region in the HCoV-NL63 genome. 


Discussion 


Homologous recombination is well known_for 
several RNA viruses, including coronaviruses.7*° A 
“copy-choice” mechanism has been proposed, in 
which the RdRp, together with the nascent RNA 
strand, dissociates from the original template and re- 
associates at the same position on another template 
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Figure 3. Discordance in clustering of different isolates of HCoV-NL63 at regions 3K, 6K and 21K. Three alignments of 
only variable sites subtracted from the original sequence with DnaSP 4.0 software, shows the discordance in clustering at 
different regions of the genome. Groups A and B were created arbitrarily to show the discordance. Group A was defined 


subsequently recommencing RNA synthesis.” 
Recombination of coronavirus genomes has been 
observed in vitro in cell culture,***° in experimen- 
tally infected animals,”” and in embryonated eggs.”* 
In the case of infectious bronchitis virus, there is 


evidence for homologous recombination occurring 
in the field.*”~°* We present evidence that recombi- 
nation has occurred during the evolution of HCoV- 
NL63 and that viral isolates possess a mosaic geno- 
me structure. Recombination was discovered by full 
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Figure 4. Identification of recom- 
bination sites in the S gene. The 
alignment includes only subtracted 
variable sites. These variable sites 
were subtracted with DnaSP 4.0 
software. The change of color repre- 
sents the alternation of genetic 
clustering between isolates. The 
numbers at the top represent the 
beginning of the S gene (nt 20,472 in 
the Amsterdam 1 isolate), coordi- 
nates of recombination spots inside 
the S gene (nt 21,061-21,072 and 
21,575-21,576 in the Amsterdam 1 
isolate) and the 3’ terminus of the S 
gene (nt 24,542 in the Amsterdam 1 
isolate). 


Molecular Evolution of HCoV-NL63 


969 


60 


50 


Similarity (%) 


b 
Oo 


w 
oO 


20 


so 


"300.400. 500° 600 700 
Position (bp) 


10 —— 
100 200 


% of Permuted Trees 
|] NWA AD 
oo o0°co0dc Oo 


Oo 


0 100 200 300 400 500 600 700 
Position (bp) 
(c) 
0.8 
0.75 
0.7 
20.65 
oO 
_ 0.6 
20.55 
© 
5 05 
£ 


0.4 
0.35 } 
03 SSO et tet teal 
0 100 200 300 400 500 600 700 
— synonymous position in M gene [nt] 
“= non-synonymous 


Figure 5. Signs of interspecies recombination in the M 
gene. Similarity plot (a) and bootscan analysis with the 
Kimura (two-parameter) distance model, neighbor-joining 
tree model and 500 bootstrap replicates (b) of the HCoV- 
NL63 M protein. (c) Analysis of the substitution pattern on 
the synonymous and non-synonymous level between the 
M gene of HCoV-NL63 and HCoV-229E. The window 
used was 40 nucleotides with a step of one nucleotide. 


genome sequence analysis of HCoV-NL63 variants 
from clinical samples. Analysis of several genome 
regions showed discordance in the phylogenetic 
clustering along the genome, a clear sign of recom- 
bination. Because the majority of informative sites 
are located at non-coding positions one can exclude 


that related genome sequences were the result of 
convergent evolution due to positive selection. 

HCoV-NL63 is the causative agent of up to 10% 
of all respiratory illnesses.47-1015°9%¢ This high 
prevalence obviously increases the possibility of a 
recombination event through co-infection with 
another human or zoonotically transmitted animal 
coronavirus. Thus, recombination might enable 
highly pathogenic recombinant virus variants to 
arise. There is some evidence for recombination 
between PEDV and an ancestral HCoV-NL63 strain. 
Whereas HCoV-NL63 is mostly similar to HCoV- 
229E, a part of the HCoV-NL63 M gene shows the 
highest similarity to PEDV, suggesting a possible 
interspecies recombination event. The M protein is a 
key player in virus assembly and budding, thereby 
interacting with other structural viral proteins. The 
M protein of coronaviruses has been shown to span 
the virion membrane three or four times. The N,xo- 
Cendo topology is adopted by most coronaviral M 
proteins, but it has been proposed that the M protein 
of TGEV is present in the viral envelope in two 
topological states, Nexo-Cendo and Nastas’ ThE 
sequence analysis with the TMHMM v 2.0 predic- 
tion software? suggests a similar scenario for HCoV- 
NL63 (data not shown). Also the M protein of PEDV, 
but notably not HCoV-229E, shares these character- 
istics, which suggests that this domain of the M gene 
is obtained by interspecies recombination. 

Coronaviruses are well equipped to adapt rapidly 
to changing ecological niches by the high substitu- 
tion rate of their RNA genome. The average 
substitution rate for this family was estimated to 
be 10 * substitutions per year per site.*°°? HCoV- 
NL63 is a member of the group I coronaviruses, with 
highest similarity to HCoV-229E and PEDV. These 
three species cluster with a recently described bat 
coronavirus (BatCoV, strain 61)*” in subgroup Ib. 
Our efforts to establish the substitution rate of 
HCoV-NL63 failed, as there is not enough sequence 
data available from isolates of past years. For this 
reason we decided to calculate the substitution rate 
for HCoV-229E, using partial sequences of the S 
gene from different dates. Based on this substitution 
rate, we dated the divergence time of HCoV-229E 
and HCoV-NL63 to the 11th century. The reliability 
of molecular dating is dependent on the validity of 
the molecular clock hypothesis, which assumes that 
the substitution rate is roughly constant. A max- 
imum likelihood test confirmed that the molecular 
clock hypothesis is suitable for the coronavirus data 
set investigated here. 

We propose that around 900 years ago the HCoV- 
229E and HCoV-NL63 viruses started to evolve from 
a common ancestor into the direction of separate 
species. For HCoV-OC43 it has been estimated that it 
emerged at the end of the 19th century or the 
beginning of the 20th century, implying that it has 
only been around for 100 years.*!4? The SARS-CoV 
was introduced in humans in 2002, and for HCoV- 
HKU1, this has not been investigated. The 
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heterogeneity of HCoV-HKU1, which appears in 
two separate genotypes and a third genotype that is 
a recombinant of these two genotypes,*°+ suggests 
that this virus was not recently introduced into the 
human population, similar to the situation of HCoV- 
NL63. The divergence of HCoV-NL63 and HCoV- 
229E was followed by a separation of HCoV-NL63 
into two lineages, of which we suspect that this 
occurred at geographically distinct locations. Sub- 
sequently the two lineages recombined during co- 
infection, illustrated by the fact that all four full 
genome sequences available thus far display a 
mosaic genome organization. 

The variability of coronaviruses has not been 
studied thoroughly. Although there are several 
reports concerning SARS-CoV, the relatively short 
time that this virus was present in the human 
population makes a long term study impossible. The 
first variable region of HCoV-NL63 encodes the 
three proteins nsp1—nsp3. The biological function of 
nspl and nsp2 proteins is thought to be linked to 
virus replication.***° The nsp3 protein is a corona- 
viral papain-like proteinase (PLP"°) that is expected 
to be a multifunctional protein with several domains 
that mediate various enzymatic activities.*”** Thus, 
the high variability of this region might influence the 
viral replication and interaction with cellular pro- 
teins. The second variable region is located in the 5’ 
part of the S gene. This region contains 24% of all 
polymorphic nucleotides, whereas it encompasses 
only ~4% of the genome. The coronavirus S protein 
is an important determinant for the host cell 
specificity and tissue tropism, which is largely 
determined by the distribution of its receptor. 
Recently, Hofmann et al. reported that HCoV-NL63 
uses the angiotensin converting enzyme 2 (ACE2) 
molecule as a receptor. The interaction of NL63-S 
with ACE2 is surprising, as HCoV-NL63 is closely 
related to HCoV-229E, which uses CD13 as a 
receptor. Furthermore, NL63-S shares no appreci- 
able amino acid identity with SARS-CoV-S, which 
does use ACE2. The amino acid sequence of the 
CD13-binding site in 229E-S is 57% conserved in 
NL63-S, whereas the alignment of the ACE2-bind- 
ing site of SARS-CoV-S with NL63-S reveals only 
14% aa identity. These data suggest that NL63-S and 
SARS-CoV-S might have evolved different strategies 
to interact with ACE2. The receptor binding domain 
of NL63-S protein resides in the S1 region.” Thus, 
the variability that we observed may alter the ACE2- 
binding properties of the S protein or, alternatively, 
the binding to a co-receptor. 

Besides the two hypervariable regions, we also 
identified regions with a remarkably low substitu- 
tion rate. The 1b gene is extremely conserved, most 
prominently in the region encoding RdRp (nt 12416- 
15195). Analysis of the ORF3 gene shows high 
conservation of this gene, unlike what has been 
reported for HCoV-NL63.'” This suggests a vital 
function of the ORF3 protein during natural infec- 
tion. Further investigations on the ORF3 protein 
function is needed to determine its real biological 
relevance. 


The observation of recombination within the 
HCoV-NL63 group indicates that two lineages, 
identified in previous reports in the 1a gene, cannot 
be treated as separate lineages. Characterization and 
typing of currently circulating strains should be 
performed with at least two assays, based on se- 
quences derived from hypervariable regions 1-6000 
nt and 20,000-21,000. On the contrary, a sensitive 
diagnostic assay for detection of HCoV-NL63 should 
be designed in the regions with highest stability 
such as the 1b or N gene. 


Materials and Methods 


Patient isolates 


HCoV-NL63 positive patients were identified by a 
diagnostic nested RT-PCR as described* or by a real time 
PCR that was performed with primers NF (5’ GCGTGTT- 
CCTACCAGAGAGGA 3’) and NR (5’ GCTGTGGA- 
AAACCTTTGGCA 3’), and HCoV-NL63 was detected 
with probe NP (5’ FAM-ATGTTATTCAGTGCTTTG 
GTCCTCGTGAT-TAMRA 3’) as described.’* A total of 
23 NL63-positive patients were identified within the 
Academic Medical Center, Amsterdam. Two HCoV-NL63 
isolates were selected for full genome sequencing (57 and 
496) and several genome fragments were sequenced for the 
other 21 isolates (Table 1). Sampling dates and patient 
characteristics are summarized in Table 1. Sequencing was 
performed on an ABI 3700 machine (Perkin-Elmer Applied 
Biosystems) using the BigDye terminator cycle sequencing 
kit (version 1.1). Chromatogram sequence files were 
inspected and assembled with CodonCode 1.4 and further 
corrected manually. Several genomic regions were ampli- 
fied using the following primers. For the 1a gene: sense 5’ 
GGTCACTATGTAGTTTATGATG 3’ and sense 5’ GG- 
ATTTTTCA TAACCACTTAC 3’; antisense 5’ CTT- 
TIGATAACGGTCACTATG 3’ and antisense 5’ CTCA 
TTACATAAAACATCAAACGG 3’. For the S gene: sense 5’ 
GGTTGTTGTTACGCAATAAT GGTCGT 3’; antisense 5’ 
ACACGGCCATTATGTGTGGT 3’. For ORF3: sense 5’ 
ATTGTT TAACTTCATCAATGC 3’; antisense 5’ CCA- 
TAAAATGGAATTGAGGACAATAC 3’. For N: sense 5’ 
CTCTCAGGAGGGTGTTTTGTCAGAAAG 3’; antisense 
5’ ATAATAAACATTCA ACTGGAATITA C 3’. 


RT-PCR and full genome sequencing 


The cDNA used for sequencing was generated with 
MMLV-RT, 1 pg of random hexamer DNA primers, in 
10 mM Tris (pH 8.3), 50 mM KCI, 0.1% (v/v) Triton-X100, 
6 mM MgCl, and 50 pM dNTPs at 37 °C for 90 min. The 
cDNA was converted into double-stranded DNA in a 
standard PCR reaction with 1.25 units of Taq polymerase 
(Perkin-Elmer) per reaction and appropriate primers. Full 
genome sequencing of the two field isolates of HCoV- 
NL63 was performed with single round RI-PCR as 
described above, with a set of overlapping PCR products 
(average size 700 bp) encompassing the entire genome. 
Primers were designed in regions that are conserved 
between the Amsterdam 1 and NL isolates of HCoV-NL63. 
Primer sequences used for full genome sequencing are 
available on request. The 5’ and 3’-terminal sequence were 
determined by 5’ RACE (Invitrogen) and 3’ RACE as 
described.* Each PCR fragment was sequenced on both 
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strands and the virus isolates were amplified and 
sequenced on separate dates to prevent sample contam- 
ination. Each experiment contained negative extraction 
controls. Sequencing was performed as described above. 


Sequence analysis 


Multiple sequence alignments were prepared with 
ClustalX 1.83 and manually edited in BioEdit. Phyloge- 
netic analyses were conducted using MEGA, version 3.1. 
Bootscan and similarity graphs were prepared with 
SimPlot 2.5 software.”’ Analysis of HCoV-NL63 varia- 
bility and synonymous and non-synonymous substitu- 
tions was done with DnaSP 4.0 software. Positive selection 
was analyzed with PAML 3.14 software.°' The analysis is 
according to the codon-based evolution models (one-ratio, 
neutral and selection models) developed by Nielsen and 
Yang,” which allows the ratio of synonymous and non- 
synonymous substitution rates to vary among amino acid 
positions. This method uses dg and dy to denote the rates 
of synonymous and non-synonymous substitutions, 
respectively. Their ratio reflects the selection intensity at 
the amino acid level. 


Molecular clock analysis 


Evolutionary rates were estimated using three app- 
roaches: Bayesian inference in BEAST, version 1.2 (kindly 
made available by A. J. Drummond and A. Rambaut, 
University of Oxfordt) and serial ML estimate and 
sUPGMA with PEBBLE 1.0.7’ Divergence times were 
estimated using Bayesian inference in BEAST, version 
1.2}. The Markov chain Monte Carlo (MCMC) length was 
10° with a sample frequency of 10° and effective sample 
size of 9x10*. The burn-in was 10’. Convergence to 
stationarity was investigated using the Tracer 1.3 MCMC 
trace analysis tool. The molecular clock hypothesis was 
tested by the likelihood ratio test. 


Nucleotide sequence accession numbers 


The sequence of HCoV-NL63 isolate Amsterdam 57 and 
Amsterdam 496 described here were deposited in Gen- 
Bank under accession numbers: DQ445911 and DQ445912, 
respectively. The partial sequences of patient isolates were 
deposited in GenBank under accession numbers 
DQ462752-DQ462792. The GenBank accession number of 
HCoV-NL63 isolate Amsterdam-1 is NC_005831, isolate 
NL is AY518894, HCoV-229E is AF304460, and PEDV 
(strain CV777) is AF353511. The GenBank accession 
numbers of the 6K region of isolates 072, 246, 248 and 
466 are AY567494, AY567489, AY567493 and AY567488, 
respectively. 
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